Entity Extraction Without Language-Specific Resources
Authors
Abstract
We describe a named-entity tagging system that requires minimal linguistic knowledge and thus may be applied to new target languages without significant adaptation. To maintain a language-neutral posture, the system is linguistically naïve and in fact reduces the tagging problem to supervised machine learning. A large number of binary features are extracted from labeled data to train classifiers, and computationally expensive features are eschewed. We have initially focused our attention on linear support vector machines (SVMs); SVMs are known to work well when a large number of features is used, as long as the individual vectors are sparse. We call our system SNOOD (Hopkins APL Inductive Retargetable Named Entity Tagger).
Similar Resources
A Named Entity Extraction System and its Web extensions
In this work we describe a Named Entity Extraction system originally developed within the scope of the EU-funded FACILE project, and currently used within the CONCERTO project. The system was first tested at the MUC-7 competition. The purpose of the system is to identify and classify proper names in free text. In the FACILE project these were mainly financial news, however the system and res...
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
Unsupervised concept based entity extraction from scientific titles
This paper studies the extraction and typing of entities from titles of academic literature, in order to gain a deeper understanding of their specific contributions and automate the construction of a problem-solution knowledgebase. To achieve this goal, we propose an unsupervised, domain independent, two phase algorithm to extract entity mentions and type them into appropriate concepts. In the r...
Annotating and Recognizing Event Modality in Text
Current results in basic Information Extraction tasks such as Named Entity Recognition or Event Extraction suggest that we are close to achieving a stage where the fundamental units for text understanding are put together; namely, predicates and their arguments. However, other layers of information, such as event modality, are essential for understanding, since the inferences derivable from fac...
Towards Heterogeneous Resources-Based Ambiguity Reduction of Sub-typed Geographic Named Entities
The aim of this work is to find sub-typed Geographic Named Entities from the analysis of relations between Place Names surrounded nominal group within a specific phrasal context in a set of textual documents. The paper presents a method involving natural language processing and heterogeneous resources like gazetteers, thesauri or ontologies. The work and the results focus on a French language corpus....